10.5555/2484920.2484942
Research article

Speeding-up reinforcement learning through abstraction and transfer learning

Published: 06 May 2013

ABSTRACT

We are interested in the following general question: is it possible to abstract knowledge that is generated while learning the solution of a problem, so that this abstraction can accelerate the learning process? Moreover, is it possible to transfer and reuse the acquired abstract knowledge to accelerate the learning process for future similar tasks? We propose a framework for conducting two levels of reinforcement learning simultaneously, in which an abstract policy is learned while a concrete policy is learned for the problem, and both policies are refined through exploration and interaction of the agent with the environment. We explore abstraction both to accelerate the learning of an optimal concrete policy for the current problem and to allow the generated abstract policy to be applied when learning solutions for new problems. We report experiments in a robot navigation environment showing that our framework is effective in speeding up policy construction for practical problems and in generating abstractions that can be used to accelerate learning in new, similar problems.
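To make the two-level idea concrete, the sketch below runs tabular Q-learning at the concrete and the abstract level on the same stream of experience; the abstract table can then be handed to a learner on a new, similar task to bias its exploration. This is an illustrative assumption, not the paper's algorithm: the class name, the phi abstraction function, and the reuse_prob mixing scheme are hypothetical stand-ins for the framework described in the abstract.

```python
# Minimal sketch (assumption, not the paper's method): tabular Q-learning kept
# at two levels at once. A user-supplied abstraction phi() maps concrete states
# to abstract states; the abstract Q-table can be reused on a similar task.
import random
from collections import defaultdict


class AbstractConcreteLearner:
    def __init__(self, actions, phi, alpha=0.1, gamma=0.95, epsilon=0.1,
                 abstract_q=None, reuse_prob=0.3):
        self.actions = list(actions)
        self.phi = phi                      # concrete state -> abstract state
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)         # concrete Q(s, a)
        # Pass a previously learned abstract table here to transfer knowledge.
        self.abstract_q = abstract_q if abstract_q is not None else defaultdict(float)
        self.reuse_prob = reuse_prob        # chance of following the abstract policy

    def act(self, state):
        if random.random() < self.epsilon:
            return random.choice(self.actions)          # explore
        if random.random() < self.reuse_prob:
            # Follow the (possibly transferred) abstract policy.
            z = self.phi(state)
            return max(self.actions, key=lambda a: self.abstract_q[(z, a)])
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, s, a, r, s_next):
        # Concrete-level Q-learning update.
        best_next = max(self.q[(s_next, b)] for b in self.actions)
        self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])
        # Abstract-level update on the image of the same transition under phi.
        z, z_next = self.phi(s), self.phi(s_next)
        best_next_abs = max(self.abstract_q[(z_next, b)] for b in self.actions)
        self.abstract_q[(z, a)] += self.alpha * (
            r + self.gamma * best_next_abs - self.abstract_q[(z, a)])
```

Transfer to a similar task would then amount to constructing a new learner with the old abstract table, e.g. `AbstractConcreteLearner(actions, phi, abstract_q=old_learner.abstract_q)`, so that early exploration is guided by the abstract policy while the concrete policy for the new task is learned from scratch.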


Published in

AAMAS '13: Proceedings of the 2013 International Conference on Autonomous Agents and Multi-agent Systems
May 2013, 1500 pages
ISBN: 9781450319935

Publisher

International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC

Acceptance Rates

AAMAS '13 paper acceptance rate: 140 of 599 submissions (23%)
Overall acceptance rate: 1,155 of 5,036 submissions (23%)
